Character segmentation using visual interword constraints in a text page
Abstract
Character segmentation is a critical preprocessing step for text recognition. This paper presents a method that uses the visual inter-word constraints available in a text image to split word images into smaller image pieces. The method applies to machine-printed text in which the same spacing is always used between identical pairs of characters. The visual inter-word constraints considered here include information about whether one word image is a sub-image of another. For example, consider two word images A and B, showing "mathematical" and "the". If the short word image B is found to be a sub-image of the long word image A, then A is split into three pieces, A1, A2 and A3, where A2 matches B, A1 corresponds to "ma", and A3 corresponds to "matical". The image piece A1 can then be used to split A3 into two parts, "ma" and "tical". The method is based purely on image processing, using the visual context in a text page; no recognition is involved.
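The splitting step in the "mathematical" / "the" example can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how sub-image matching is performed, so this sketch assumes noise-free binary images of equal height and uses exact pixel comparison while sliding the short image across the long one. The names `render`, `find_subimage`, and `split_by_subimage`, the random toy glyphs, and the `max_mismatch` parameter are all hypothetical and introduced only for illustration.

```python
import numpy as np

# Toy "font" (an assumption for this demo): each letter is a fixed random
# 8x5 binary glyph followed by one blank column, so identical character
# pairs always have identical spacing, as the method requires.
_rng = np.random.default_rng(0)
_GLYPHS = {ch: _rng.integers(0, 2, size=(8, 5))
           for ch in sorted(set("mathematical" + "the"))}
_GAP = np.zeros((8, 1), dtype=int)


def render(word):
    """Render a word as a binary image using the toy glyphs above."""
    return np.hstack([np.hstack([_GLYPHS[ch], _GAP]) for ch in word])


def find_subimage(long_img, short_img, max_mismatch=0.0):
    """Return the column offset of the best match of short_img in long_img.

    A match is accepted when the fraction of differing pixels is at most
    max_mismatch. Returns None when no offset qualifies.
    """
    if long_img.shape[0] != short_img.shape[0]:
        return None
    w_long, w_short = long_img.shape[1], short_img.shape[1]
    if w_short == 0 or w_short > w_long:
        return None
    best_offset, best_score = None, None
    for x in range(w_long - w_short + 1):
        window = long_img[:, x:x + w_short]
        score = np.mean(window != short_img)  # fraction of differing pixels
        if score <= max_mismatch and (best_score is None or score < best_score):
            best_offset, best_score = x, score
    return best_offset


def split_by_subimage(long_img, short_img, max_mismatch=0.0):
    """Split long_img into (A1, A2, A3), where A2 is the matched region."""
    x = find_subimage(long_img, short_img, max_mismatch)
    if x is None:
        return None
    w = short_img.shape[1]
    return long_img[:, :x], long_img[:, x:x + w], long_img[:, x + w:]


# Word images A ("mathematical") and B ("the") from the abstract's example.
A = render("mathematical")
B = render("the")
A1, A2, A3 = split_by_subimage(A, B)          # "ma", "the", "matical"

# Reuse piece A1 ("ma") to split A3 ("matical") further.
prefix, matched, rest = split_by_subimage(A3, A1)  # empty, "ma", "tical"
```

In this toy setting the second split finds "ma" at offset 0, so the leading piece is empty and the remainder corresponds to "tical". On scanned pages an exact comparison would be too strict; a nonzero `max_mismatch` is one simple way to tolerate noise, again as an assumption rather than the method described in the paper.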
Similar resources
A Modified Character Segmentation Algorithm for Farsi Printed Text Using Upper Contour Labelling
In this paper, a modified segmentation algorithm for printed Farsi words is presented. This algorithm is based on a previous work by Azmi that uses the conditional labeling of the upper contour to find the segmentation points. The main objective is to improve the segmentation results for low quality prints. To achieve this, various modifications on local baseline detection, contour labeling an...
The role of interword spacing in reading Japanese: An eye movement study
The present study investigated the role of interword spacing in a naturally unspaced language, Japanese. Eye movements were registered of native Japanese readers reading pure Hiragana (syllabic) and mixed Kanji-Hiragana (ideographic and syllabic) text in spaced and unspaced conditions. Interword spacing facilitated both word identification and eye guidance when reading syllabic script, but not ...
Cherry Blossom: A System for Japanese Character Recognition
A general purpose Japanese character recognition system, Cherry Blossom, has been developed at CEDAR in past years. It is designed to recognize Japanese document images in low resolution or with poor print quality. The system includes modules for page skew correction, document segmentation, text segmentation, character recognition and postprocessing. The API code for each module has been develo...
Segmenting Text Images With Massively Parallel Machines
Image segmentation, the partitioning of an image into meaningful parts, is a major concern of any computer vision system. The meaningful parts of a text image are lines of text, words and characters. In this paper, the segmentation of pages of text into lines of text and lines of text into characters on a parallel machine will be examined. Using a parallel machine for text image segmentation al...
Publication date: 1995